Rate determining factors in protein model structures 
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Previous research has shown a strong correlation of protein folding rates to the native state 
geometry, yet a complete explanation for this dependence is still lacking. Here we study the rate- 
geometry relationship with a simple statistical physics model, and focus on two classes of model 
geometries, representing ideal parallel and antiparallel structures. We find that the logarithm of 
the rate shows an almost perfect linear correlation with the "absolute contact order", but the slope 
depends on the particular class considered. We discuss these findings in the light of experimental 
results. 



In the last decade many results have been published on 
the relationship between kinetic and structural features 
of protein folding, after the seminal work by Plaxco, Si- 
mons and Baker |l| revealed a simple linear relationship 
between the logarithm of the folding rate and the rela- 
tive contact order (CO) for a set of two-state proteins. 
CO is defined as the sum of the sequence distances | j — i\ 
of any residue pair in contact in the native struc- 
ture (with a " contact" being defined according to a cutoff 
distance), divided by the total number of contacts and 
number of residues. A similar good correlation was ob- 
tained by Jackson considering the absolute contact 
order (ACO, equal to CO times the number of residues) 
and using a slightly extended protein set. Galzitskaya et 
al. in Q showed that length, rather than CO, is relevant 
in three-state proteins. The role of the length is also con- 
sidered in [1, H, @, 0| , while the logarithm of the relative 
contact order, together with CO and ACO, was consid- 
ered in and a combination of CO and length in 
The rates have been also related to linear combinations of 
the length and its logarithm [13] , long range order (LRO), 
total contact distance II ill . fraction of short range con- 
tacts [12 ]. cliauishness lljjl . effective contact order flij ]. 
Plotkin and coworkers [l5| discussed the correlation with 
heterogeneities in contact distance and energy, while the 
absolute contact order, and its length dependence, was 
reconsidered in On the basis of the cited (and cer- 
tainly not exhaustive) set of investigations it is difficult 
to reach a definite conclusion about which measures of 
native state properties are most relevant to determine 
the folding rate. Indeed, while experimental rates show 
a clear correlation with the structural parameters pro- 
posed in the above literature, they are always distributed 
with a relevant spread around the theoretical curve, that 
remains to be accounted for. On- and off-lattice theoret- 
suggest that adding cooperativity 



ical modelling [171 . 
and/or local preferential conformations to Go like models 
improves the correlation of the rates with CO, also in- 
ducing spread in the rate distribution. But also the high 
heterogeneity of the distribution of the sequence-distance 
of the contacts in protein structures could be responsi- 



ble for the rate spread [l5[. For the above reasons, it is 
very important to try to address this issue with model 
structures, in the framework of a simple model with few 
parameters. We focus on two classes of structures, the 
ideal analogues of real parallel and antiparallel secondary 
structures (a-helices and /3-sheets): our aim is to find a 
set of rules that may be later used to rationalize the re- 
lationship between real protein geometries and rates. 

We resort to the Wako-Saito-Muhoz-Eaton (WSME) 
model, which allows an exact solution for the equilibrium 
thermodynamics and a very accurate semi-analytical ap- 
proach to the kinetics. At difference with other native- 
centric models, WSME is intrinsically cooperative and 
accounts for the local preferences of the main-chain. 
The model has already been shown to give good results 
in the determination of rates of real proteins 



their temperature dependence [2l|, the effect of mu- 
tations and mechanical unfolding rates [2^]. The 
WSME model was introduced in the end of the '70s 
fjil . [2H and then forgotten for roughly 20 years, when 



it was indepentently reproposcd by Munoz, Eaton and 



coworkers 
23. 28. 



21 



19 

29 




Several studies followed 
3|,[I1 Ell also with appli- 



cations in a very different subfield of physics [37 , 
WSME is a Go-like model [Io||, i.e. it is "native-centric", 
relying on the knowledge of the native state of a pro- 
tein to describe its equilibrium and kinetics. Its binary 
degrees of freedom are related to the values of the dihe- 
dral angles at the peptide bonds [27j], classified into just 
two states: ordered (native) and disordered (unfolded). 
Since the latter state allows a much larger number of mi- 
croscopic realizations than the former, an entropic cost 
is given to the ordering of a peptide bond. 

The model is described by the effective free energy: 



N-l N 

i=i j=i+i 



N 



(1) 



fe=i 



where N is the number of peptide bonds in the molecule, 
R the ideal gas constant and T the absolute temperature, 
mfe e {0, 1} is the binary variable which tells whether the 



k-th bond, i.e. the one between residues k and k + 1. is in 
the disordered (0) or ordered (1) state, and qk is the cor- 
responding ordering entropic cost. The product Y[k=i m k 
takes value 1 if and only if all the peptide bonds from 
i to j are in the native state, thereby realizing the as- 
sumed interaction. The contact matrix with elements 
Ajj € {0, 1} tells which peptide bonds are at close dis- 
tance in the native state; non-native interactions are dis- 
regarded. The contact map beween peptide bonds, A, j is 
derived from the standard one between residues, Ar„ , as 



thus assigning the (residue i)-(residue j) 
contact to peptide bonds i and (j — 1). A[ ■ is usually cal- 
culated according to some cutoff distance residues i and 
j in the native state; here we deal with ideal secondary 
structures, and will impose A£ • accordingly. Contact 
energy will be taken as e^j = — e throughout the paper, 
unless differently stated. Without loss of generality, we 
can set e = 1. Also the entropic costs will be taken to 
be homogeneous, qk = q = 2 In 2. Several values of these 
entropies have been considered in various works up to 
now, typically based on some fits to experimental data. 
Here we want to consider model structures, so the specific 
value is not too relevant: we take q = 2 In 2, compara- 
ble to the results obtained for various molecules; in the 
following we will discuss the g-dependence of our results. 
Notice that, once fixed e and q, the only source of het- 
erogeneity comes from the contact matrix, i.e., from the 
geometry of the native state. 

We define the denaturation temperature T m asking 
that the average fraction m of ordered bonds is halfway 
between its values mo = 1 (at T=0) and m^ = 

X]fcli(l+ e<i,fc )~ 1 a ^ infinite temperature. In the follow- 
ing, for each structure considered, we will always work at 
the corresponding T m . Exact evaluation [24, 25|of ther- 



modynamic quantities will be performed as in [3J, 

The kinetic evolution of the model is described 
through a discrete-time master equation, p t+ i(m) = 
J2 m ' W(m' — > m)p t (m), for the probability distribution 
Pt(rn) at time i, where m = {m^, k = 1, . . . N} denotes 
the state of the system. The transition matrix W is spec- 
ified by a single bond flip Metropolis rule, as in 
The kinetics will be studied by means of the local equi- 
librium approach (2ll . |36| , where the equilibration rate k 
can be computed as the largest eigenvalue of a matrix 
of rank N(N + l)/2. It has been shown in [HI, |36| that 
this approach turns out to be very accurate when com- 
pared to exact or Monte Carlo results, and the rate so 
obtained is an upper bound of the exact one. Notably, 
this approach allows us to evaluate directly relaxation 
rates, which are the experimentally accessible quantities 
(at difference with folding and unfolding rates), without 
choosing a reaction coordinate. 

We apply the model first to parallel structures (see 
Supplementary Material, Fig. 1), where the sequence dis- 
tance of any interacting pair of residues is constant. This 
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FIG. 1: Logarithm of equilibration rate vs. absolute contact 
order for parallel structures. Symbols explained in text. 



class includes a-helices and parallel /3-sheets. The first 
structural indicator we consider is the ACO, defined as 



ACO = iV- 1 A '. J (j- ! + 1 ) 

l<i<j<N 



(2) 



where iV c = J2i<i<j<N ^ s the total number of con- 
tacts, and we add 1 to the usual j — i, since our contact 
matrix is defined with reference to peptide bonds, and 
the number of the latter involved in a contact is the cor- 
responding number of residues minus one. 

In Fig. [T] we report the natural logarithm of the equili- 
bration rate at T m for several parallel structures, defined 
as parallel /3-sheets with s strands and r residues per 
strand, where consecutive strands are separated by loops 
of I residues not involved in any contact. In such a struc- 
ture the number of peptide bonds is N ~ s(r + I) — I — 1 
and ACO = r + l; contact matrix elements are Ajj = 1 
if j = i + I + r — 1 and Ajj = otherwise. The a-helix 
corresponds to the case I = 0, ACO = r — 4. For every 
(s, r) pair we consider values of / from up to 8 in order 
to vary the ACO. The effect of dilution is also considered 
(for s = 4, r = 6 only), where this regular structure is 
perturbed by removing contacts with probability 1 — p. 
The value of p, when not otherwise specified, is 1. 

We see that all the results fall almost perfectly on the 
same straight line. A joint linear fit including all data 
yields Ink = 2.6633 — 1.3113 ACO, with correlation co- 
efficient -0.998. Considering different values for the en- 
tropic costs would not change the overall behaviour but 
only the slope. For instance, q = 2 yields a slope -1.73. 

This result means that, according to this model, the 
ACO is of fundamental importance, while the rate cannot 
depend in a relevant way on other measures like the CO 
or the chain length, since each ACO value corresponds 
to several values of CO and total length. LRO, total 
contact distance, the fraction of short range contacts, 
and heterogeneity in contact distance are not applica- 
ble here, since all contacts present the same sequence 
distance. Even the introduction of contact-energy het- 
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FIG. 2: (a) Logarithm of equilibration rate vs. absolute con- 
tact order for antiparallel structures at the denaturation tem- 
perature T m . Symbols explained in text, (b) Absolute value 
of the slope in (a) as a function of s. 



crogcncitics does not affects the rate: considering s = 4, 
r = 7 and I = and 4, and taking tjj uniformly dis- 
tributed in [— e — Ae, — e + Ae] , with Ae e [0, e] , we found 
that the variation of In fc was compatible with the spread 
in Fig. Q] less than 0.02 in absolute value for I = 
(lnfc ~ —5.45 in the uniform case), and less than 0.6 
for I = 4 (lnfc ~ —10.39 in the uniform case). These 
figures hardly change if we consider the interactions as 
product of charges associated to residues, as in (HI, an d 
take random charges, or if we consider a correlated dis- 
order, e.g. strcnghtcning the contacts of a group of three 
consecutive residues, leaving the others unchanged. In 
view of the above result it is worth checking whether this 
almost perfect linear behaviour still holds for structures 
which maintain some regularity, but where the contact 
distance is not constant. We therefore turn to antiparal- 
lel structures, and extend the results reported in [2l| for 
a four-stranded antiparallel /3-sheet, considering sheets 
with s strands and r residues per strand; in the case s = 2 
we have a simple hairpin (see Supplementary Material, 
Fig. 2). Turns corresponds to peptide bonds fcr; contacts 
are established between bonds i and j = 2fcr — i, with 
(fc — l)r + 1 < i < fcr— 1 and fc = 1 , • • • , s — 1 labelling the 
turns; the total number of peptide bonds is N = sr — 1. 
This represents the most connected case: every residue 
but two (in the first and last turn) has at least one con- 
tact. The contact distance varies from 3 to 2r — 1 in steps 
of 2, and hence ACO = r + 1. 

In Fig. Ufa) we report the natural logarithm of the 
equilibration rate at T m as a function of ACO for several 
(s, r) values. It is clearly seen that the almost perfect lin- 
ear correlation is now valid only at fixed strand number 
s (again, slopes would be larger for a larger entropic cost 
q). The absolute value of the slope increases with s and 
appears to tend to a constant for large s. Indeed, a fit to 
an exponential function is made in panel (b) and is al- 



most perfect (|Slopc| = 1.21 — 3.5 exp(— 0.62s), estimated 
variance is 0.00017, correlation coefficient is 0.9996). 

The limiting value of the slope is slightly smaller than 
that obtained for parallel structures, whence we have 
that, at fixed ACO> 4, the hairpin is the fastest struc- 
ture, then the other antiparallel structures come, in order 
of increasing s, and finally we have the parallel structures. 

The exponential dependence of the logarithm of the 
rate on s is quite puzzling: to understand such a behav- 
ior, we study the folding mechanism, as revealed by the 
probabilities of native "strings" Sij, i.e., of configura- 
tions where peptide bonds i and j are unfolded, while 
all those in between them are native. Assuming that 
the relaxation rate can be related to the height of the 
highest barrier along the folding pathway, the analysis 
reveals that the latter is represented by the folding of 
one hairpin, or more precisely of all the residues between 
peptide bond u = (fc — l)r and v = (fc + l)r + 1, be- 
ing u and v unfolded, for any fc between 1 and (s — 1). 
The probability of such a configuration can be esti- 
mated as p(S U)V ) = exp(/3 m e(r — 1) — 2rq)/Z UtV , where 
fim = l/(RT m ). Z u<v is the partition function restricted 
to the region between u and v. It can be estimated as 

Zuv = A 2r-l +Ae /3 m e(r-l)-(2r-l) 9 
r-2 

+ (2e-« + l)^^ 2 (^-3 e /W;-(2j+i) 9) (3) 

3=0 

where A = (1 + exp(— q)). If our system can be consid- 
ered a two-state folder we have that the relaxation rate 
fc = kf + k u is the sum of the folding and unfolding rates; 
the latter will be identical at T = T m and, assuming 
an Arrhenius scheme, they will be proportional to the 
folding probability p(S UtV ). So, if the above assumptions 
hold and we have identified the correct barrier, we will 
find the same behavior for lnp(S UtV ) as for lnfc. Notice 
that the expression of p(S UjV ) contains clearly the ACO, 
since r=ACO-l, while the only quantity depending on s 
is the value of f3 m : one finds indeed that the exponential 
dependence of In fc on s is related to the dependence of 
[3 m on s and ACO. Taking this into account in the expres- 
sion oip(S u . v ) yields a dependence on ACO which is com- 
pletely analogous to that reported in Fig. [2j where again 
hip(S U)V ) can be fitted by straight lines whose slopes fol- 
low the law: |Slope| = 1.40 — 5.1 cxp(— 0.70s), (estimated 
variance: 0.0012, correlation coefficient: 0.9980) reason- 
ably close to the value obtained for the rates. 

The above result is very interesting, because it proves 
that an Arrhenius framework can still be applied in this 
case, despite the free-energy profiles may present more 
than one minimum and barrier. But even more, these re- 
sults establish a quantitative connection between the rate 
and the number of strands through the stability stating 
that as the number of strands of the system increases, the 
T m increases asymptotically, implying a global increase 
of the stability, while the equilibration rate decreases. 
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Notice that the dependence of the rate on s found in the 
model is consistent with the reported observation of a L a 
- dependence [HI : indeed if we fit the rate to the number 
of residues L = sr we find for the antiparallcl structures 
In A; = —0.73 x L 64 , which is not far from the exper- 
imental exponent reported in [lit ( a = 0-70) and from 
a = 0.61 coming from off-lattice simulations Q. How- 
ever, an accurate comparison of the dependence of rates 
on length should take into account the frequency of the 
different structures in real proteins, which is out of the 
scope of the present work. 

To conclude, let us review the main findings: resorting 
to a simple statistical model, we have performed a de- 
tailed analysis of the dependence of the relaxation rate 
on some structural indicators, known to correlate with 
protein folding times. To elucidate the role and the inter- 
play of the different factors, we have studied ideal helical, 
parallel and antiparallel secondary structures, of differ- 
ent length and number of strands. Our results confirm 
the absolute contact order ACO as the main structural 
determinant of the rates, but suggest different folding 
mechanisms for parallel and antiparallel structures: the 
latter, which fold faster than the former at equal ACO, 
show a dependence on the number of strands which we 
can relate to the dependence of the denaturation tem- 
perature on the ACO and number of strands. We also 
find that the dependence of the rate on the length of the 
protein is consistent with the power-law reported in (ljj 
for real proteins, despite the fact our ideal structures lack 
the structural and energetic heterogeneity of the latter. 
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